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Gravitational Waves (GWs) from the early universe and unresolved astrophysical sources are 
expected to create a stochastic GW background (SGWB). The GW radiometer algorithm is well 
suited to probe such a background using data from ground based laser interferometric detectors. 
Radiometer analysis can be performed in different bases, e.g., isotropic, pixel or spherical harmonic. 
Each of these analyses possesses a common temporal symmetry which we exploit here to fold the 
whole dataset for every detector pair, typically a few hundred to a thousand days of data, to only 
one sidereal day, without any compromise in precision. We develop the algebra and a software 
pipeline needed to fold data, accounting for the effect of overlapping windows and non-stationary 
noise. We implement this on LIGO’s fifth science run data and validate it by performing a standard 
anisotropic SGWB search on both folded and unfolded data. Folded data not only leads to orders of 
magnitude reduction in computation cost, but it results in a conveniently small data volume of few 
gigabytes, making it possible to perform an actual analysis on a personal computer, as well as easy 
movement of data. A few important analyses, yet unaccomplished due to computational limitations, 
will now become feasible. Folded data, being independent of the radiometer basis, will also be useful 
in reducing processing redundancies in multiple searches and provide a common ground for mutual 
consistency checks. Most importantly, folded data will allow vast amount of experimentation with 
existing searches and provide substantial help in developing new strategies to find unknown sources. 

PACS numbers: 


I. INTRODUCTION 

A large number of sources in the universe are expected 
to emit short and long duration Gravitational Waves 
(GWs) [1-3]. Though we have convincing evidence of 
the existence of GWs [4, 5], they have not yet been de¬ 
tected directly. A number of worldwide efforts are ongo¬ 
ing to detect GW signals of different kinds and at differ¬ 
ent wavelengths [6-11]. The advanced ground based laser 
interferometric detectors, AdvLIGO [6], AdvVIRGO [7] 
and KAGRA [8], are the most promising candidates for 
the first detection of individual GW events, e. g., sig¬ 
nal from the coalescence of a compact binary. The de¬ 
tectors, however, will be sensitive to individual sources 
only from the nearby universe, perhaps up to few hun¬ 
dred Mpc to few Gpc [12], depending on the nature 
and strength of the sources. The rest of the sources 
in the universe, namely the faint and the distant ones, 
are likely to create a stochastic gravitational wave back¬ 
ground (SGWB) [ .3, 14]. Early universe processes like 
inflation and phase transitions are also expected to con¬ 
tribute to the background [15-17]. If the background is 
dominated by the high redshift universe, it will likely be 
more isotropic. However, the collections of faint sources 
in the local universe, e. g., the huge population of milli¬ 
second pulsars in nearby galaxy clusters [18], may create 
a stronger astrophysical foreground [19] which will then 
make the net background predominantly anisotropic. 
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Detection of a SGWB is one of the most important 
focus areas of current observational astronomy [20]. It 
would not only provide a “smoking gun” test of inflation, 
but it can also provide information on average properties 
of various astrophysical objects not accessible to conven¬ 
tional electro-magnetic astronomy. Several upper limits 
have been placed on SGWB at different frequencies using 
data from a wide class of experiments, e.g., LIGO, Virgo, 
ALLEGRO, EPTA, WMAP, BICEP2, Planck [21-33]. 

Over the past three decades a robust algorithm to 
search for long duration SGWB, by cross-correlating data 
from pairs of ground-based GW detectors, has been de¬ 
veloped [34-42] and implemented on real data [22-25, 43]. 
Though these searches are performed for isotropic and 
anisotropic backgrounds in different bases, they can be 
cast as a single unified maximum likelihood (ML) based 
radiometer analysis. The algebra then clearly reveals a 
temporal symmetry that could be utilised to fold data 
to one sidereal day. Data in this context are complex 
time-frequency maps incorporating outputs of pairs of 
detectors, which are often available as intermediate prod¬ 
ucts of different SGWB searches. We can fold the time- 
frequency maps along the time axis modulo the sidereal 
day. Though in principle it is a simple scheme, the pro¬ 
cess becomes somewhat involved in order to account for 
overlapping Hanning windows used in data preprocessing 
to prevent spectral leakage [44]. 

We developed a parallel pipeline to implement folding 
on LIGO data. To test the method and the implementa¬ 
tion we apply the pipeline to LIGO’s fifth science run (S5) 
data. We validate the folded data by running a standard 
anisotropic SGWB search pipeline on it with nominal ap¬ 
proximations and comparing the skymaps with the same 
obtained from unfolded data. 
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This paper is organised as follows. We begin with a 
brief review of the radiometer algorithm for a network 
of detectors in section II. The methodology for folding is 
described in section III. Implementation and numerical 
results are presented in section IV. We conclude in sec¬ 
tion V with discussions on the method, the results and a 
list of important advantages that folded data can provide. 


II. GW RADIOMETER 

The GW radiometer algorithm [34-42] optimally 
probes either an isotropic or anisotropic SGWB and gen¬ 
erates an all-sky map. The algorithm uses earth rotation 
aperture synthesis imaging, as is often used in radio as¬ 
tronomy. The same GW signal arrives at different times 
at geographically separated detectors. If the strain data 
from two detectors are cross-correlated, accounting for 
the delay between the sites in receiving the signals from 
a given direction, the signals add coherently whereas the 
noise does not. Thus, upon integration over a reasonably 
long observation time, the signal cross-correlation grows 
faster than noise variance, making the detection statis¬ 
tic more and more significant. The rotation of the earth 
provides different orientations of the detectors and the 
baselines, which makes it possible to make a map of the 
sky even with only two detectors. 

The radiometer algorithm emerges naturally as the 
maximum likelihood solution to the probability of find¬ 
ing an anisotropy in the data from two detectors. A brief 
review of the algebra is given below. Based on this, it 
is straightforward to show how data can be folded for a 
radiometer search. 

Let us consider a two detector “baseline” I constituted 
by the detectors denoted by T\ and X 2 (calligraphic letter 
denotes detectors in the corresponding baseline). Time- 
series data s Xl2 (t) from the detectors are the sum of the 
signal h Xl2 (t) and the noise n Xl2 (t), 

SXi(t) = h Xl (t) + n Xl (t), (1) 

s Xo _(t) = h l2 {t ) + nx 2 (t). (2) 

It is more convenient to divide the data into smaller 
time segments and to work with their ‘Short-term Fourier 
Transforms’ (SFTs). The SFT of a time series s(t) for a 
segment of duration r is given by 

s(t;f) ■■= f t+T 2 dt's(t') e~ i2nft '. (3) 

Jt-r/2 

The argument t in s(t; /) is a timestamp to mark a spe¬ 
cific segment. Unless otherwise specified, we represent 
the frequency domain Fourier Transform of a time series 
by putting a ~ on top of the corresponding variable. 

Statistically the noise in a detector is expected to be 
uncorrelated with the GW strain in that detector, or the 


noise in another geographically well separated detector. 1 
Hence, one has 

{ni 1 , 2 {t-J)h Il ' 2 {t;f)) =0, (4) 

(' n Xl {t\f)n l2 (t-,f )) = 0. (5) 

Since a stochastic background is characterised by the 
second moment of the signal (power spectral density) and 
cross-correlation techniques are used to probe a signal, we 
start our analysis with the cross-power spectral density 
(CSD) of data from a baseline. CSD is defined as the 
product of the complex conjugate of SFT of one detec¬ 
tor’s data for a certain segment with the SFT of data 
from the other detector for the same segment. Thus, for 
a baseline /, the CSD (C 1 ) and the corresponding noise 
(n J ) in the small signal limit, (\h(t; f)\ 2 ) -C (|h(t;/)| 2 ), 
are given by 

C 1 = Cj t := r Xi (t-J)s X2 (t;f), (6) 

n J = n) t ■- n Xl (t;f)n X2 {t;f). (7) 

The expected instantaneous cross-power noise variance is 
given by 

a 2 Ift := (rifl n ft) = ^ (i; /) Pi 2 (i; /), (8) 

where P x 12 (t; f ) are the one-sided power spectral density 
(PSD) of noise n Xl 2 for segment t and r is the duration 
of a segment. 

An SGWB can be modeled by the expected shape of its 
frequency PSD H(f ) and its expected amplitude V(tt) 
in the direction of the sky fi. 2 One can expand the 
“SGWB skymap” P(ft) in any chosen basis e a (fl), such 
that, 

V(n) := Y J V o t e a (U). ( 9 ) 

OL 

So far in literature the following bases have been used for 
SGWB searches: e a (fl) = 1 (isotropic search, a = 0), 
e a (tt) = 5( £1 — f2 a ) (pixel basis, a is the pixel index) 
and e a (n) = Yim (spherical harmonic basis, a = Im). 
In general, in any basis, including the ones mentioned 


1 There can be small traces of correlated noise present in two geo¬ 
graphically separated detectors caused by global magnetic fields 
from Schumann resonances [45]. Methods have been proposed to 
mitigate those noises [46]. The same prescription can be followed 
for generation and use of folded data. 

2 Note that, so far SGWB search algorithms have been developed 
only for cases where the shape of the frequency spectrum H(f) 
is constant in every direction, only the amplitude V(fi) varies. 
In a more general scenario the shape of the power spectrum can 
also depend on direction fl. In principle, a targeted radiometer 
search can be applied in such cases, given a model for H(f ) for 
every direction, though it has not been tested yet. 
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above, the expectation of the CSD from the baseline I 
can be written as [42] 

(Cj t ) := rH{f) 5> a 7/ t ,a, (10) 

a 

where the general overlap reduction function 

a J s 2 

( 11 ) 

and A is the polarisation (+ or x). yj t a contains 
all the information about the antenna pattern functions 
Fj i2 (Cl,t) of the detectors, baseline separation Ax/(f) 

and the basis e a (f2). It is worth noting that for small 
enough segment duration (r ~ 1 minute), a varies 
insignificantly with t within a segment [43]. 

From the above discussions, combining Eqs. ( 6 ), (7) & 
(10), one can express the observed CSD C J from a given 
baseline I as a linear convolution of the true sky "P = V a , 

C 1 = K 1 V + n 1 , (12) 

through a deterministic kernel (a. k. a. the beam function 
in pixel basis) [see Eq. (10)] 

K 1 = Kj tia := tHU) 7^a, (I 3 ) 

and additive noise n 1 = n,j t with covariance (in the small 
signal limit) [see Eq. ( 8 )] 

•r 1 = Vftj't' ~ Co v(n I f t, nI f, tl )=S tt 'Sffiaj ft . (14) 

For the case of multiple (say Nt) baselines, one can com¬ 
bine the data, noise and the kernel as independent obser¬ 
vations respectively as 



/C 1 \ 


f n 1 \ 


K 1 ' 

C := 

c 2 

; n := 

9 

rr 

; K := 

K 2 


v ° Nh ) 


v niVb ) 


K Nb 


The noise covariance matrix still remains block diagonal, 
N = N lftJ , f , t , = (n Xl (i; f)n* l2 (i; f)n* x , (f'; f)n x . (/'; /')> 
= 8 n >5w5ff><j 2 Ift , 

(15) 

as the expectation vanishes if either X\ ^ X[ or X’ 2 7 ^X 2 . 
Combining the data vectors in this manner retains the 
single baseline form of the convolution equation, 

C = K V + n. (16) 

The objective of a search is to estimate the coeffi¬ 
cients V a from the CSDs from one or more baselines. 
A standard ML solution to the above convolution equa¬ 
tion, which provides an estimate of the true sky [41], can 
be written as 

(17) 


where X = X a is called the dirty map, 

X :=Kt • N- 1 • C =7 

ift 

4 v, -» „ „ (18) 

and T = Y aa i is the Fisher information matrix, 
r :=Kt • N- 1 • K =► r aa , = Kfta <*!■ ft Kftx 




H 2 (f) 


ift 
* T 


Ift 




(19) 


Thus computing X and T from data in a chosen basis is 
the essential goal of an SGWB search. 


III. FOLDING DATA TO ONE SIDEREAL DAY 
A. General Formalism 

The idea behind folding follows trivially from the for¬ 
mulae for the ML estimation presented in the previous 
section. Since we are applying earth rotation synthesis 
imaging, the kernel Kj t a has a period of one sidereal 
day (i.e. 23 hr 56 min 4 sec). This symmetry can be 
exploited to fold the non-periodic part, the entire data of 
several hundreds of days, to one sidereal day. 

Both the quantities needed for ML estimation of V a , 
the dirty map X a and the Fisher information matrix 
r aa /, involve summations over time segments marked by 
t. These summations can be split into two parts using 
t = *day x T s + t s , where iday is a dimensionless integer 
representing the sidereal day number in which a given 
t lies, t s is the remainder within that sidereal day and 
T s is the duration of one sidereal day. Equivalently, the 
summation over t in Eq. (18) and Eq. (19) can be broken 
into two summations, £ t -A £j day £ ta , where *day runs 
over of the total number of sidereal days for which data 
is being processed and t s runs over one sidereal day. This 
is schematically presented in Figure 1. 

Following the above convention, it is straightforward 
to introduce folding by rewriting Eq. (18) and Eq. (19) 
as 


X a = E K ft s ,a x ft s , 

( 20 ) 

Ift B 



( 21 ) 

Ifts 



where the quantities summed over all sidereal days, 


.1 
ft s 

^ ^/(iday^s + fs) ^/(^dayT' s +t s ) ’ 

( 22 ) 


^day 


’/ts 

= E a If(i^yT e +U) > 

(23) 


®day 


V a = V = T 1 X 
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FIG. 1: This diagram illustrates the folding scheme using 
three days of LIGO S5 data from GPS time 860832366 sec 
to 861090858 sec. Each point on the spiral represents one 
segment marked by a GPS time and its color represents the 
corresponding sidereal time. The projected ring at the bottom 
represents folded data. Missing points in the spiral and the 
projected ring represent missing data. Had we considered 
many more days of data in this plot, the gaps in the projected 
ring would have disappeared. As we will see, folding generates 
a ring for each frequency bin, that is, a time-frequency map. 


are the folded data , which span only one sidereal day. 
Thus, without any loss of generality or precision in deriv¬ 
ing the final results X a and r aa /, the long time-frequency 
maps Cj t (complex) and its variance erf f t (real) for each 
baseline are now compressed into one-sidereal-day-long 
time-frequency maps (complex) and vj ts (real). Note 
that non-stationarity of noise has automatically been 
taken into account in the above formulae through the 
inverse variance weights, t +t)' Eq. (23). 

Each segment of folded and unfolded data has the same 
number of complex and real vectors, hence data com¬ 
pression is exactly given by the ratio of the number of 
segments in unfolded and folded data. Computational 
speed up is also given by the same ratio, as it is roughly 
proportional to the data volume in a radiometer analysis. 

Since Eq. (22) and Eq. (23) do not involve any quantity 
with an index a, folded data is independent of the search 
basis. Hence, similar to the unfolded time-frequency 
maps, it is possible to perform different types of searches 
on folded data by making different choices of the SGWB 
spectrum H(f) and overlap reduction function a . 


B. Correction for Overlapping Window 

A smooth window function is often applied to the time 
series data from the detectors to reduce leakage from 
strong spectral lines in the SFTs. Applying such a win¬ 
dow naively, would, however, lead to effective loss of data, 
as major portion of quality data does not receive full 
weighting. To prevent this, the windows are made to 
overlap. A method was developed for using 50% overlap¬ 
ping Hanning windows in SGWB analyses [44] and was 
applied to LIGO-Virgo stochastic searches [22-24]. Ac¬ 
counting for overlapping windows makes the algebra for 
folding considerably involved, though the end result is 
fairly simple and elegant, which we present below. 


Overlapping windows introduce a correlation between 
SFTs from neighbouring segments, hence the noise co- 
variance matrix becomes tridiagonal, 

Niftj'ft> = S IP Sff-[8 U ' aj ft + 

a ft,f(t- 1) + fyt+i)*' a ftj(t+ 1)] • 


Note that, for brevity of notations t has been used here 
as an integer index to time segments, hence t ± 1 repre¬ 
sent respectively the next and the previous segments. To 
convert t to proper units of time, one needs to multiply 
it by the segment duration r. The non-zero off-diagonal 
components of the covariance of CSD, cr 1 = for 

50% overlap and nearly white noise are given by [44] 


a ftJ(t± l) 



— £ t ±l [°//t + I)]/ 2 ; ( 2 5) 

Wi if segment til exists for baseline I 
0 otherwise 


and the coefficient Wi depends on the functional form 
of the window function. If wxi(t) and wz 2 (t) are the 
window functions applied to the time series data from 
the respective detectors, the coefficient is given by 


w = wii (*) Wh {t + r/2) wi 2 (f) w l2 (f i t/2) 

2 (f) wf 2 (*) 

where the bar on the quantities denotes average over 
time. Since the window function remains the same for 
the whole stretch of the data, the averaging needs to be 
done over a single segment only. For the same Hanning 
window on both the data streams, this factor turns out 
to be Wi « 3/70. 

To proceed further we require the inverse of the covari¬ 
ance matrix N. A tridiagonal matrix of the form 

A = A t jj — oii Sij -f- if + B r 8i(j— i) ■ (27) 

such that /ctj\ <C 1, for all i,j, when multiplied to 
the matrix 


B = Bj k = — 


\ pi, 

Ojk d j(k + 1) 


PJ, 


(28) 


one gets, 


Ajj Bjk — 5ik + (9(|/3 ± /a'|"). (29) 
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Hence, matrix B can be regarded as the inverse of matrix 
A to first order in \f}±/a\. We use this to invert the 
matrix cr 1 to first order in Wi (accuracy ~ Wf ~ 0.2%). 
By substituting at = 0/y t , Pt = &f t f(t±i) an( i Eq. (25), 
one gets, 


-l 
ft,ft 


-l 

ft,f(t± i) 


l/a t . — (TjJt 

pf = 

OL t ott± i 


£ t± 1 r -2 


+ O 


J/(i±l)] 


( 30 ) 
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all other elements of [cr 1 ] 1 are zero. In turn, the inverse where s Xl 2 (t; /) are windowed SFTs. The variance then 
of the (block diagonal) matrix N [Eq. (24)] becomes, becomes 

[N]- 1 ee MF/Wt'* 5n'5 ff ,[5u'W}- f lf t + 


[E/Et-l) + S(t+ l)t' W] ft J( t +1)] • 


(31) 


U/t 




We now follow a convention often used in LIGO-Virgo 
analysis to avoid unnecessary factors arising from the 
Hanning windows in the expectation of the statistic. We 
redefine the CSD as, 


w Xl {t)w l2 it) 


PxAt;f) P l2 (t-f) . (33) 


c 7 = C/t == 


1 


All the other formulae above remain unchanged, as the 
quantities appearing in the final formulae have been ex- 


w Xl ( t)w l2 ( t ) 


s Xl (£; /) s x 2 (i; /)! (32) pressed in terms of Cj t and ajj t . 


Using these results, the expressions for dirty map X and Fisher information matrix T [Eqs. (18) and (19)] become, 

1 


X ° = E*7*,c 


i ft 


— 2 n I 

a ift L ft 2 


{ 


—2 —2 
a ift + a if(t~ 


i)} c f(t- 1) ~ £l 


2 °t+i \ u ift 


|cr rf " + a 


u(t+ 1 )} C’/b+i) 


(34) 


Fern' = E K flc 
lft 


-2 T, -I 1 

CT //t A ft,a' ~ - £ - 


I 

t—1 


a Ift + a If(t- 1) ^ A 


} A /(t_l), a ' 2 £ ‘ +1 { 


+<T //(i+l)} X f(t+ l)>a' '(^ 5 ) 


r 


and , have been introduced. The final expres- 


The above formulae apply to the standard (unfolded) uy 

radiometer analyses. We now write them in terms of sions for the three real folded data are given by 

folded data. We again partition the summation over t to 
summations over i<j a y and t s . The formulae for dirty map 
and Fisher information matrix, Eqs. (20) & (21), can still 
be cast in a way that they take simple forms, 


V fU 


X ° = E *&,«**. > 

Ifts 

Faa' = J2 K ftA K k^ v k 

Ifts 


(36) 

(37) 


A /(t s -l),a' uI ft e K f(ts+ l),a' W ft B ] > 

where the expressions for xf ta and Vf f , provided in 
Eqs. (22) & (23) for the unwindowed case, have been 
modified with correction terms and two more folded sets, 


u k 


U fts 


= E 1 

*day 


J r /(*dayT’s+^s) 5 


2 ^dayT s +t s -l ^ 


*day 


I -2 | -2 

J'/(iday T S+*s) a //(idayT a + t B -l) 


-El 


(38) 


2 U May^B+(s + l ^ 


-2 , -2 
^//(idayT.+tH,) "T" °//(idayT a +t B + l) 


and owe complex folded time-frequency map 


x k 


^ ^ If OdayT+^s) (tlay A 2 ^ d ayA+t a 1 ^ UOdayA+ts) A UOdayA+^s — 1) ^ ^/OdayU+^B 1) 


2 e *dayT a -(-t s + l j^//(zdayA+is) ^ If (^day A -(-£ B +1) J ^f Oday A +t B +1) 


(39) 


Although we made three real folded datasets (u, v, 
w) to ensure exact analysis, in practice, one set, a com¬ 
bination of them, can provide adequate accuracy. One 
can do this by replacing the kernels for the adjacent seg¬ 


ments in Eq. (37), Kju ±1 j a> by the kernel for the cen¬ 
tral segment, Kj t ^ a . This is possible not only because 
the kernels in neighbouring segments differ by ~ 10% 
at / ~ 1 kHz (lesser for smaller frequencies), provided 
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the segment duration is ~ 1 min, so that the earth has 
not turned significantly during the combined interval of 
~ 2 min spanned by three consecutive overlapping seg¬ 
ments, but also the terms Kf (f ±1) a are weighted by W/ 
in Eq. (37), so the error introduced by this approxima¬ 
tion is less than a percent. Then the expression for Fisher 
information matrix, Eq. (37), simplifies to, 

r ««' « E K J*U, a K k,« \ v k ~ u k - w fk ■ (40) 

Ift s 

Hence, instead of vj t and only one quantity, 

Vft e := v k ~ u k ~ w k > ( 41 ) 

suffices to provide adequate accuracy. In addition, if 
noise can be taken as sufficiently stationary, such that 
the neighbouring segments have nearly the same PSDs, 
one could further simplify the folded data as, 


V fts V fdj^yTv+tcz) ‘^dayT’s+is — 1 ^dayTs+£s + l) ’ 

^day 


I -2 

X fu - 2^ a if(Ma.yT e +t s ) 

WOdayTs+ta) 

^day 


nl 

^day^s+^s - 1 /(*day^s+^s _ 

— F 1 C 1 

-1) MayU+ts+1 f (i dayU+^s + 1) 


(42) 


These approximations cause insignificant (< 1%) dif¬ 
ferences in the final results, yet simplify the analysis 
pipeline. Hence, it is a common practice by the LIGO 
community to include them in radiometer analyses. How¬ 
ever, we emphasise that the folding method does not re¬ 
quire these approximations. If necessary, one can develop 
a modified pipeline to make use of all the folded datasets 
for a more exact analysis. 

It is also worth noting that the time-frequency maps 
[u^J 1 / 2 and can be used respectively as effec¬ 

tive noise PSDs and CSD of two detectors. This repre¬ 
sentation of folded data can be readily incorporated into 
standard analysis pipelines, as done in section IV. 


IV. IMPLEMENTATION AND RESULTS 

We have developed a code for folding data to one side¬ 
real day and implemented it on real data. To demon¬ 
strate the method and validate the results we apply a 
standard analysis code developed by the LIGO Scientific 
Collaboration to search for a stochastic background on 
both folded and regular unfolded data. For this purpose 
we have used LIGO’s fifth science run data from the Han¬ 
ford and the Livingston detectors. 

Data from the detectors are stored in 128 sec long 
frame files. Raw data are down-sampled from 16384 Hz 
to 2048 Hz. Then a high pass filter is used to allow fre¬ 
quencies above 32 Hz to reduce seismic noise. The SFTs 


of the data from the two sites are then combined to ob¬ 
tain the CSDs, which are written to a new set of frame 
files with a reduced set of metadata. These new frames 
are called Stochastic Intermediate Data (SID). Each file 
contains one SID frame of duration 52 seconds. This 
duration was chosen because one sidereal day is approxi¬ 
mately 86164 seconds long, which is an integral multiple 
of 52. A frequency resolution of A/ = 0.25 was used. The 
data was windowed to avoid spectral leakage of strong in¬ 
strumental lines with Hann function with 50% overlap. 

Our code folds the SIDs to Folded Stochastic Interme¬ 
diate Data (FSID). The FSID frames are written in such 
a format that the standard LIGO analysis pipeline can be 
directly used to analyse this data. To match the pipeline, 
we make use of the slightly approximate folded data given 
by Eq. (42) to create effective noise PSDs and CSD de¬ 
scribed above. Note that the FSID frames we generate 
also store the four folded datasets, u, v, w, x, needed 
to perform an exact analysis, if ever necessary, though it 
will require some changes in the existing pipeline. 

We first run the pipeline on SID frames for a certain 
observation period to search for an anisotropic SGWB in 
the spherical harmonic (SpH) basis [24]. The pipeline 
applies the same algebra that was presented in Sec¬ 
tion II and Thrane et al. [42]. We choose a source PSD 
H(f) = where the parameters f R and j3 are 

respectively the reference frequency and spectral index 
for the (assumed) power-law behaviour of the GW spec¬ 
trum. Although this validation study is independent of 
the parameters used in the analysis, we choose a set which 
is commonly used in LIGO-Virgo analyses [22-24]. We 
search for a flat spectrum by setting /? = 0 and f R = 100 
Hz (redundant for /? = 0). We choose a spherical har¬ 
monic multipole cut-off of Z max = 15. The processed 
output comes in the form of a dirty map and the Fisher 
information matrix for individual jobs, which can be com¬ 
bined and further post-processed to make a skymap. 

Next, we produce FSID by folding the same set of SID 
frames. The SpH search is then applied to the FSID 
frames following the identical procedure as that for the 
SID frames and a skymap was generated. The only dif¬ 
ference now is that we use far less computing power to 
process the data, as quantified below. The procedure 
followed for this validation is depicted in the flow chart 
shown in Figure 2. 

We perform the validation for ~ 10 calendar days, 
~ 100 calendar days and full ~ 2 calendar years worth 
of LIGO’s fifth science run (S5) data from Hanford (HI) 
and Livingston (LI) detectors. Computation time re¬ 
quired to fold the data was respectively ~ 0.2, 2, 9 CPU 
hours (depending on the load on the computer and stor¬ 
age devices at the time of the analysis) and the cost for 
performing the SpH analysis on the unfolded data are re¬ 
spectively ~ 1,10,48 CPU hours. Since folding is a one 
time affair, while radiometer analysis is run on the same 
data many times, computational cost for folding is prac¬ 
tically negligible. The cost of running the SpH analysis 
on folded data is nearly a constant , it takes only ~ 10 
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FIG. 2: The flowchart illustrates the structure of the Stochastic pipeline and the path followed for the validation of results 
from unfolded and folded data. The main point to note here is that the same analysis was performed on SID and FSID frames 
and the FSID frames were produced from the same set of SID frames. 


minutes on a single core of a 2.0GHz Intel Xeon E7-4820 
processor. Hence the computational cost saving is ~ 300 
times for one full analysis of the S5 data (because S5 has 
total ~ 300 days of usable cross-power data) and this 
factor will remain the same for every analysis, isotropic 
or directed, that one performs on the same set of data, 
e.g., with different choices of H(f ), angular resolution, 
frequency bands and masks. 

The comparison of results obtained from folded and 
unfolded data are presented in Figure 3 and Table I. 
Since the differences are larger for short durations of 
data, the figures are shown only for ~ 10 days’ data. In 
Figure 3, the dirty map (left) and its standard deviation 
map (right) for unfolded (row #1) and folded (row #2) 
data are shown. Row #3 of Figure 3 shows the differ¬ 
ence between the top two rows. Clearly, the maps from 
folded and unfolded data match very well and are visu¬ 
ally indistinguishable. We also plot the SNR (row #4) 
and deconvolved clean (row #5) maps from unfolded data 
(left) and their differences from the corresponding maps 
obtained from folded data (right). As can be seen from 


the colorbars, the differences are much smaller than the 
actual maps. 

The quantitative differences are presented in Table- 
1, which shows the fractional root-mean-square (RMS) 
differences between different maps for the three cases. If 
a result vector A is obtained from unfolded data and 
B is the corresponding result from folded data, we use 
the usual definition of fractional RMS difference, which is 
given by ||B — A||/||A||, where the norm of a A is defined 
as || A|| := V A'' ■ A. Here each component of the vectors 
A and B corresponds to a pixel index or a pair of l, m in 
the Spherical Harmonic basis. From Figure 3 and Table I, 
it is clear that the differences are much smaller than the 
standard deviation at each pixel (as evident from the 
SNR difference map) and, hence, can be ignored for all 
practical purposes. 

It is worth mentioning that the match between the 
skymaps from unfolded and folded data that we see here 
is not just a statistical match, it is an absolute match. 
The whole skymaps obtained from the data sets match, 
not just their statistical properties like mean or variance. 
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FIG. 3: Maps constructed from 10 days of LIGO S5 unfolded and folded data and their differences are plotted here. Row #1 and 
row #2 show the dirty map (left) and its standard deviation map (right) obtained respectively from unfolded and folded data. 
The differences between the top two rows are plotted in row #3. The results obtained from unfolded and folded data are clearly 
small. In row #4 and #5 we show the SNR and clean maps respectively from the unfolded data (left) and their differences 
(right) from the corresponding maps obtained from folded data. The differences are presented in a more quantitative form in 
Table I. Note that all the difference maps are statistically much smaller then the unfolded or folded maps, hence validating the 
folding method and the code. 
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Observation time 

GPS start-end (sec) 

10 calendar days 

860832366-861701598 

100 calendar days 

860832366-869499943 

Full S5 

816065726-877591411 

Real part of Fisher matrix 

2.55 x 

10 -5 

1.02 x 

10" 5 

5.57 x 

10“ 6 

Imaginary parts of Fisher matrix 

3.66 x 

10~ 5 

1.51 x 

10” 5 

7.93 x 

10“ 6 

SpH coefficients of dirty map 

3.34 x 

10" 4 

1.76 x 

10~ 4 

1.92 x 

10“ 4 

SpH coefficients of clean map 

3.44 x 

10" 4 

2.74 x 

10~ 4 

2.62 x 

10“ 4 

Dirty map in pixel space 

2.85 x 

IQ - 4 

1.53 x 

10” 4 

1.67 x 

10“ 4 

Clean map in pixel space 

3.40 x 

10" 4 

2.23 x 

10~ 4 

2.34 x 

10" 4 

Standard deviation map of dirty map 

4.42 x 

10" 6 

2.80 x 

10” 6 

1.82 x 

10“ 6 

SNR map of dirty map 

2.91 x 

10” 4 

1.59 x 

10~ 4 

1.73 x 

10“ 4 


TABLE I: Table of fractional RMS differences between results for different quantities obtained from unfolded and folded data. 
If A and B are two result vectors (e.g., pixel space maps or Spherical Harmonic coefficients) obtained respectively from unfolded 
and folded data, the fractional RMS difference is given by ||B — A||/||A||, where || A|| := y At • A. The differences in the table 
are much smaller than 1, implying excellent match between the results. Some of the maps and differences for 10 days’ data are 
shown in Figure 3. 


So the match implies that the signals have been com¬ 
bined with the right phases in the folded data. There¬ 
fore, performing simulations with injected signals would 
be a redundant exercise and hence not pursued here. 

The residual difference of ~ 10 -4 in the dirty map 
is caused by misalignment between the SID and FSID 
frames. This difference is much smaller than the stan¬ 
dard deviation, as evident from the SNR difference, and 
hence does not need any special attention in practice. 
However, since they arise from two avoidable reasons, we 
describe the cause and remedy for these misalignment 
for the sake of completeness. In our implementation we 
compute the Greenwich mean sidereal time for the mid¬ 
segment GPS timestamp of an SID frame and fold that 
data into an FSID frame which has the closest sidereal 
time. If all the SID frames were separated by multiples of 
26 sec, we could perfectly align every SID frame with its 
corresponding FSID frame, by choosing an appropriate 
offset (between 0 — 26 sec) for the start time of the first 
FSID frame. However, the LIGO S5 SID frames are sepa¬ 
rated by multiples of 26 sec only for contiguous stretches 
of data, spanning over a maximum of few days. This 
does not hold for many days worth of data constituted by 
non-contiguous segments. Hence, in the later case, SID 
and FSID frames can not be perfectly aligned for every 
segment. One can, however, align the whole dataset eas¬ 
ily by dropping on the average ~ 5 minutes of data per 
day. Secondly, a misalignment is also caused by the fact 
that one sidereal day is slightly longer than 86164 sec 
by ~ 0.1 sec. This implies that there will be an align¬ 
ment shift of ~ 0.1 sec per day, which can accumulate 
up to 13 sec then it resets to —13 sec in our implemen¬ 
tation. This error can be vastly reduced by choosing the 
segment duration more precisely, say, 52000054/rs. Then 
the shift per sidereal day (given by the reminder when 
one mean sidereal day, 86164.090530833 sec, is divided 
by the segment duration) will be 1052.8^s, that is, less 
than a second for the whole duration of S5 run. Note 


that the baselines formed by the present and upcoming 
ground based laser interferometric detectors have an an¬ 
gular resolution of a few degrees (~ 0.1 rad) [41]. In 
the worst case, an average timing jitter of ST = 13 sec 
corresponds to an angular shift of ~ 2nST/T s « 0.001, 
which is ~ 1% of the angular resolution of the radiome¬ 
ters considered here and hence these timing inaccuracies 
are practically negligible. 

In summary, the results demonstrate that the folding 
method and the codes are working correctly. The main 
difference was that to analyse full S5 data, we processed 
nearly a million SID frames, while only 3314 FSID frames 
were analysed to get the same result. This reduced the 
computation cost for processing and post-processing by 
a factor of ~ 300. 


V. CONCLUSIONS 

We have formulated and implemented an algorithm to 
fold entire datasets for pairs of GW detectors to only 
one sidereal day’s data, which is of enormous advantage 
to GW radiometer analyses. We developed a parallel 
pipeline to implement this method on data from ground- 
based interferometers and applied to LIGO’s fifth science 
run data. The skymaps obtained from folded and un¬ 
folded data through a standard LIGO anisotropic SGWB 
search pipeline showed excellent match, thus convinc¬ 
ingly validating the method and the implementation. 

Folding follows from an algebraic identity, hence the 
results are exact. Even for incorporating overlapping 
windows, the tiny error arising from the inversion of a 
tridiagonal matrix is inherent to the analysis, folded data 
merely incorporates that approximation with matching 
precision. Also, folding process takes care of the correc¬ 
tion, so that, folded data can be analysed in a straight¬ 
forward way without repeating this complex procedure, 
as if there are no overlapping windows. 








10 


The folding process, though fairly straightforward in 
principle, does capture the complications involved in real 
data analysis. For instance, it does not assume stationar- 
ity of noise, hence no special care is necessary to account 
for variation in noise power spectrum, say, on different 
days of the week or at different times of the day. Folded 
data also account for the overlapping Hanning windows 
applied to data to reduce spectral leakage. The frequency 
masks, often used in analyses to remove the noisy bits of 
data, can be readily applied to folded data. One can 
also remove non-stationary data [43] while folding, by 
discarding those frequency bins of an unfolded segments, 
whose variances are significantly (say, more than 20 %) 
different from those in the adjacent segments. 

The advantages one can derive out of the folded data 
is many-fold: 

1. Efficiency: If n sidereal days’ data is folded to one 
sidereal day, computational cost reduces by a factor 
of n. The one time folding step requires negligible 
amount of computing compared to the total cost 
of running different radiometer analysis few times 
each on the same data. Hence for few years worth 
of data, the computational cost reduction could be 
by a factor of as much as ~ 1000. Moreover, the 
folding step already accounts for overlapping win¬ 
dow correction and other preprocessing steps, so 
there is further reduction in computation cost. 

2. Portability: Since folded data volume is reduced by 
a factor of n, it will be convenient to transport data 
from computer to computer. For standard stochas¬ 
tic analysis, folded data size is only ~ 1.3 gigabytes, 
which comfortably fits on a USB memory stick. 

3. Convenience: The above two points imply that 
folded data for standard searches can be processed 
in a personal computer. This will allow multiple ex¬ 
perimentation with current analyses and will make 
it easier to develop new search algorithms. In ad¬ 
dition, cross-correlation based searches which were 
not conceivable due to computational limitations 
have now become possible to perform. 

4. Robustness: Since the preprocessing steps are the 
same for all analyses with folded data, better con¬ 
sistency check between different searches would be 
possible. 

5. Modularity: Since the folding part is separate from 
the search part, it will be possible to do the disk 


I/O intensive folding part in a low level language 
(e. g., C) and the complicated algebra of filtering 
for different searches in more high level language 
like MATLAB or Python. 

6 . Management: Folded data duration and volume are 
independent of the total observation time, hence 
the computation cost, storage etc. are independent 
of the number of days of observation. Which can 
be of help in planning computation budget and de¬ 
signing of parallel pipelines. 

The enormous efficiency that folded data brings will 
allow one to perform different kinds of neat analyses. 
For instance, an all sky targeted search for narrowband 
sources over a frequency range of few hundred Hz may 
become feasible. One may be able to push the frequency 
bin size down in order to efficiently search for narrow 
line emissions [ 17] . We caution the reader that we do not 
fold single detector’s data, which could perhaps provide 
enormous efficiency to search for purely monochromatic 
(coherent) signals. The folding scheme described here ap¬ 
plies exclusively to analyses where data from two different 
detectors are cross-correlated, which can sometimes be 
used to optimally or sub-optimally search for narrowband 
sources. On the other hand, for long-duration broadband 
sources, folding may be coupled with coarse-graining [ 8] 
for even more efficiency. In conclusion, GW radiometer is 
the optimal analysis for detecting long duration unknown 
sources. Folded data will enable enough experimentation 
with the radiometer analysis to look for such sources. 
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